feat: external MCP connectors via AgentCore Identity by philmerrell · Pull Request #174 · Boise-State-Development/agentcore-public-stack

philmerrell · 2026-04-22T23:27:00Z

Summary

Replaces the in-house OAuth flow for external MCP tools with AgentCore Identity end-to-end, and switches the per-turn consent gate from a pre-flight short-circuit to mid-turn Strands interrupts so users don't have to retype their prompt after authorizing.

Admin path — provider CRUD now lives in AgentCore (CreateOauth2CredentialProvider + friends); our DynamoDB record keeps display/RBAC metadata only.
User path — new Settings → Connectors page lets users initiate / re-consent providers; /oauth-complete calls CompleteResourceTokenAuth before notifying the opener so the vault doesn't stay empty.
Agent path — OAuthConsentHook (BeforeToolCallEvent + AfterToolCallEvent) gates tool execution. First call → AgentCore vault hit or interrupt with consent URL. Stale token after a provider-side revoke → AfterToolCallEvent detects the 401, marks the user/provider for force_authentication, sets retry=True, and the next BeforeToolCallEvent raises a fresh interrupt.
Resume protocol — oauth_required SSE events now carry interruptId; the frontend snapshots the last turn's payload and replays it with interrupt_responses after the popup closes, so the agent picks up the same turn without a retype.

Test plan

Admin: register + edit + delete an OAuth provider in /admin/oauth-providers; confirm AgentCore credential provider mirrors changes.
User: connect / reconnect a provider from Settings → Connectors and verify popup → finalize → "Connected" badge.
Agent (cold path): with no consent given, send a prompt that triggers an OAuth-gated tool; consent banner appears, popup → consent → tool runs and answer streams in the same turn.
Agent (stale token): revoke at the provider (e.g. https://myaccount.google.com/permissions), retrigger the tool, confirm the 401-detected reauth path fires (AfterToolCallEvent → retry → fresh consent → resume).
Backend unit tests: uv run python -m pytest tests/agents/main_agent/integrations tests/agents/main_agent/session/test_oauth_consent_hook.py tests/agents/main_agent/integrations/test_oauth_token_cache.py — should be green.
Frontend unit tests: npm run test:ci — should be green.

🤖 Generated with Claude Code

…middleware First phase of the Connectors refactor, which will eventually replace the bespoke OAuth token store (OAuthTokenRepository, KMS-encrypted DynamoDB, Secrets Manager client credentials, manual refresh) with AgentCore Identity's managed token vault and credential providers. - AgentCoreContextMiddleware copies the four Runtime headers (WorkloadAccessToken, OAuth2CallbackUrl, session ID, request ID) into BedrockAgentCoreContext on every invocation. Required because the Inference API is a plain FastAPI app rather than BedrockAgentCoreApp, so the SDK does not populate the context for us. No-op when headers are absent, so local development and unit tests continue to work without mocks. - AgentCoreIdentityClient wraps IdentityClient.get_token() with a narrower, platform-friendly surface for USER_FEDERATION (3LO) flows. Surfaces the "user consent required" case as a structured TokenResult(authorization_url=...) rather than an exception, so it can flow through the existing SSE stream as a new event type in a later phase. Both modules are pure additions; no existing code path calls them yet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
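
The "consent required as a structured result" idea in the commit above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `fetch_token` helper, the `lookup` callable, and the dict shape it returns are all hypothetical stand-ins for the SDK call. Only the `TokenResult(authorization_url=...)` name comes from the commit message.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TokenResult:
    """Outcome of a token lookup: a token, or a URL the user must visit."""

    access_token: Optional[str] = None
    authorization_url: Optional[str] = None

    @property
    def needs_consent(self) -> bool:
        # No token, but a URL to send the user to: consent is required.
        return self.access_token is None and self.authorization_url is not None


def fetch_token(lookup: Callable[[], dict]) -> TokenResult:
    """Wrap a vault lookup so 'consent required' is data, not an exception.

    `lookup` is a hypothetical stand-in for the SDK call. It is assumed to
    return either {"token": ...} or {"authorization_url": ...}.
    """
    raw = lookup()
    if raw.get("token"):
        return TokenResult(access_token=raw["token"])
    return TokenResult(authorization_url=raw.get("authorization_url"))
```

Returning a value instead of raising lets a caller forward the URL through an existing stream without wrapping every call site in exception handling.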

Wires the Runtime context middleware into the Inference API and swaps the external MCP client's token source from the bespoke OAuthService to AgentCore Identity's USER_FEDERATION flow. - main.py: installs AgentCoreContextMiddleware so WorkloadAccessToken and OAuth2CallbackUrl Runtime headers populate BedrockAgentCoreContext on every invocation. - external_mcp_client.py: _get_oauth_token now returns a TokenResult from AgentCoreIdentityClient instead of a decrypted token string from OAuthService. Scopes are read from the platform's OAuth provider record so organizations can change them without code. When the SDK signals that user consent is required, the authorization URL is stashed per-user for the inference route to surface via an oauth_required SSE event (emitter to follow in a subsequent commit). load_external_tools skips client creation on consent-required rather than creating a client that would fail at the first request. - Convention: the platform's provider_id is used verbatim as the AgentCore Identity credential-provider name. Admins register matching names via CreateOauth2CredentialProvider during provider setup. The OAuthService, token vault, and encryption layer are still referenced by unrelated code paths (admin routes, connections UI) and will be removed in Phase 3 once the AgentCore-backed flow is validated end-to-end. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Rebrand the user-facing OAuth UI from "connections" to "connectors" for consistent vernacular across the product. Folders, classes, types, and route paths all follow the new name; the /settings/connections URL redirects to /settings/connectors. The backend /oauth/connections endpoint is preserved as a stable contract and translated at the service layer. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Wraps bedrock-agentcore-control for admin-side OAuth2 credential provider CRUD: create/update/delete/get with vendor mapping (Google/Microsoft/GitHub to their native vendors; Canvas/Custom routed through CustomOauth2 via an OIDC discovery URL or explicit authorization-server metadata). Domain errors map 404/conflict/invalid-custom to typed exceptions so route handlers can translate cleanly. Update is intentionally non-partial: AgentCore's UpdateOauth2CredentialProvider requires a full oauth2ProviderConfigInput and Get never returns the stored client_secret, so credential rotation always re-submits both clientId and clientSecret. 17 unit tests cover every vendor path, error mapping, and the Custom-only discovery rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds Create/Update/Delete/Get/List on bedrock-agentcore OAuth2 credential providers to the app-api task role, scoped to the default token vault. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Deletes the legacy 3LO dance that predates AgentCore Identity — the per-user token vault, PKCE-based authorization service, encryption layer, token cache, user-facing /oauth/* routes, and the tool-side OAuthToolService. AgentCore Identity owns the token vault and consent flow now; the inference path already routes through agentcore_identity.py via the recent external MCP client refactor, so these modules had no live consumers. Also slims shared/oauth/__init__.py to the surviving surface (provider model, repository, registrar) and unwires the user-facing router from app_api/main.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

AgentCore Identity owns the clientId, clientSecret, endpoint config, and callback URL. Our DynamoDB record keeps only the admin metadata (display name, scopes, role gates, icon) plus cached pointers to AgentCore's record (credential_provider_arn, callback_url) for convenience. Drops authorization_endpoint, token_endpoint, authorization_params, userinfo_endpoint, revocation_endpoint, pkce_required, OAuthUserToken, and the user-side connection DTOs — all artifacts of the retired in-house flow. Adds oauth_discovery_url and authorization_server_metadata for Custom/Canvas providers, gated by a pydantic validator. Repository surface tightens to put_provider + apply_metadata_update; the Secrets Manager write/read path is gone. Admin routes (commit next) own the AgentCore round-trip and hand a fully-formed record to the repo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

POST now calls the registrar first and, on success, upserts the metadata record in DynamoDB. If the DB write fails after AgentCore has accepted the credentials, we best-effort delete the AgentCore provider to avoid orphans. PATCH distinguishes metadata-only edits (scopes, roles, display name, icon, enabled) from credential rotation. Rotation requires clientId + clientSecret together — partial updates are rejected by AgentCore's UpdateOauth2CredentialProvider contract. DELETE removes the AgentCore provider first (which revokes every user token stored in its vault), then the local record. Pre-existing connection- count checks are dropped since per-user tokens no longer live in our DB. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
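
The PATCH distinction described above (metadata-only edits versus credential rotation) might be expressed like this. The field names and the `classify_patch` helper are assumptions made for illustration; the actual route code is not shown in this PR page.

```python
# Hypothetical field names; the real request model may differ.
METADATA_FIELDS = {"scopes", "roles", "display_name", "icon", "enabled"}
CREDENTIAL_FIELDS = {"client_id", "client_secret"}


def classify_patch(patch: dict) -> str:
    """Return "metadata_only" or "rotation" for an incoming PATCH body.

    Rotation must carry both credentials, mirroring the non-partial update
    contract described in the commit message.
    """
    supplied = {key for key, value in patch.items() if value is not None}
    credentials = supplied & CREDENTIAL_FIELDS
    if not credentials:
        return "metadata_only"
    if credentials != CREDENTIAL_FIELDS:
        raise ValueError(
            "credential rotation requires client_id and client_secret together"
        )
    return "rotation"
```

A handler could use the result to decide whether the AgentCore update call is needed at all, or whether only the DynamoDB metadata record changes.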

Admin side: - Rename admin/oauth-providers → admin/connectors (file + route); old route path redirects for URL stability - Rewrite the admin model to the AgentCore-owned shape: drop endpoint fields, authorization_params, pkce_required, userinfo/revocation endpoints. Add credential_provider_arn, callback_url, and oauth_discovery_url / authorization_server_metadata for Custom vendors - Rewrite the admin form: preset picker simplified to display metadata only, Custom requires an OIDC discovery URL, credential rotation requires clientId + clientSecret together (AgentCore's update API is not partial), success screen after create displays the AgentCore callback URL with a copy button so the admin can paste it into the vendor console, edit mode shows the callback URL + ARN read-only User-facing retirement: - Delete settings/connectors (user "my connected accounts" page), settings/oauth-callback (legacy 3LO return handler), and the sidebar + route entries for them. AgentCore Identity owns the consent flow at runtime via the existing /oauth-complete landing page Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When an external MCP tool needs OAuth consent, AgentCore Identity returns an authorization URL instead of a token. This wires that signal all the way to the user: Backend: - Inference route drains pending consent URLs from the external MCP integration after the agent stream finishes and emits one oauth_required SSE event per provider before done - IAM grants bedrock-agentcore:GetResourceOauth2Token on the runtime role so the AgentCore Identity client can reach the token vault - CLAUDE.MD + SSE_ERROR_MESSAGING.md document the new event Frontend: - Stream parser recognizes oauth_required and surfaces it as an OAuthRequiredEvent - New /oauth-complete landing page handles the AgentCore callback redirect and postMessages consent completion to the opener tab - OAuthConsentService orchestrates popup opening + postMessage receipt - OAuthConsentBanner renders the Connect button inside the chat input - chat-http and assistant preview pass OAuth2CallbackUrl header so AgentCore Runtime knows where to return after consent Also updates the admin Tool form reference from /admin/oauth-providers to /admin/connectors to match the renamed admin surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…izer Adds the Settings → Connectors page so users can browse and connect OAuth-backed external tools end-to-end: - New /connectors routers on app-api (list user-visible providers via RBAC) and inference-api (initiate-consent, complete-consent) — the inference-api side runs under the AgentCore Runtime proxy where the WorkloadAccessToken context is populated. - AgentCoreIdentityClient gains a workload-token mint fallback for local dev (GetWorkloadAccessTokenForUserId) and appends provider_id to the callback URL so the landing page can dismiss the right banner. - /oauth-complete page POSTs CompleteResourceTokenAuth back through the inference-api before notifying the opener, fixing the "consent finished but vault stayed empty" race. Uses BroadcastChannel to bridge popup → opener under Chrome's COOP isolation. - New connectors settings page with a Connect / Reconnect affordance per provider, wired to the OAuthConsentService popup flow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… interrupts

The agent used to pre-flight OAuth at tool-load time and abort the whole turn if any provider needed consent; the user then had to retype the prompt after authorizing. This switches to the Strands interrupt protocol: the consent gate runs lazily before each tool call, pauses the in-flight turn, and resumes it automatically once the user finishes the popup.

Backend
- New OAuthConsentHook (BeforeToolCallEvent + AfterToolCallEvent).
  - BeforeToolCall: looks up the OAuth provider for the selected MCPAgentTool's MCPClient (no name coupling), checks the in-process token cache, and either lets the tool run or calls event.interrupt(...) with the consent URL when AgentCore Identity reports consent required.
  - AfterToolCall: detects 401-style failures from MCP tool results, marks the (user, provider) for force_authentication on the next fetch, and sets event.retry = True so the BeforeToolCall hook re-fires and triggers a fresh consent. Closes the gap where a provider-side revocation leaves a stale token in AgentCore's vault.
- New oauth_token_cache: per-(user, provider) tokens + force-reauth flags; lifecycle-managed by the hook.
- ExternalMCPIntegration always loads MCP clients with a lazy token_provider that reads from the cache; the pending_consent / drain_pending_consent dict and the route's pre-LLM short-circuit branch are gone.
- StreamCoordinator emits one oauth_required SSE event per pending interrupt before the final done event, carrying interruptId so the frontend can resume the same turn.
- ChatAgent.stream_async accepts interrupt_responses and forwards them to Strands as the resume prompt; route accepts the same on /invocations and skips quota + RAG augmentation on resume.

Frontend
- OAuthRequiredEvent type + validator gain interruptId; settings-page consent path makes interruptId optional (no agent turn to resume).
- OAuthConsentService tracks the interruptId per request and invokes a registered resume handler on broadcast success.
- ChatRequestService snapshots the last turn's payload and replays it with interrupt_responses attached when a consent completes, so the user never retypes the prompt.

Smoke-tested end-to-end: Google revoke → whoami → 401 → AfterToolCall detects + retries → fresh consent banner → popup → auto-resume → tool returns greeting in the same turn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
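
A per-(user, provider) token cache with force-reauth flags, as this commit describes, could look roughly like the sketch below. The class and method names are hypothetical; only the idea of cached tokens plus a force-reauth flag per (user, provider) pair comes from the commit message.

```python
import threading
from typing import Dict, Optional, Set, Tuple

Key = Tuple[str, str]  # (user_id, provider_id)


class OAuthTokenCache:
    """In-process token cache with per-(user, provider) force-reauth flags."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tokens: Dict[Key, str] = {}
        self._force_reauth: Set[Key] = set()

    def put(self, user_id: str, provider_id: str, token: str) -> None:
        """Store a fresh token; a successful consent clears the reauth flag."""
        key = (user_id, provider_id)
        with self._lock:
            self._tokens[key] = token
            self._force_reauth.discard(key)

    def get(self, user_id: str, provider_id: str) -> Optional[str]:
        """Return the cached token, or None when absent or flagged stale."""
        key = (user_id, provider_id)
        with self._lock:
            if key in self._force_reauth:
                return None
            return self._tokens.get(key)

    def mark_force_reauth(self, user_id: str, provider_id: str) -> None:
        """Drop the token and flag the pair, e.g. after a 401 from the tool."""
        key = (user_id, provider_id)
        with self._lock:
            self._tokens.pop(key, None)
            self._force_reauth.add(key)

    def needs_force_reauth(self, user_id: str, provider_id: str) -> bool:
        with self._lock:
            return (user_id, provider_id) in self._force_reauth
```

In this sketch an after-tool-call handler would call `mark_force_reauth` on an auth failure, and the next before-tool-call check would see `get` return `None` and request fresh consent.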



philmerrell · 2026-04-22T23:38:29Z

Code Review

Large structural rewrite (+5,125 / −5,781 across 87 files) that replaces the in‑house OAuth flow with AWS Bedrock AgentCore Identity and moves OAuth consent from a pre‑flight gate to mid‑turn Strands interrupts. Architecture is cleaner (AgentCore owns secrets/vault; our DB owns metadata), the resume‑without‑retyping UX is a nice win, and tests look complete. A handful of real issues should be addressed before merging.

Critical Issues

| # | File | Line | Issue | Severity |
|---|------|------|-------|----------|
| 1 | `backend/src/apis/inference_api/connectors/routes.py` | 126–165 | `complete_consent` forwards a user‑supplied `session_uri` to AgentCore's `CompleteResourceTokenAuth` without verifying the URI was issued for this user. If AgentCore doesn't strictly bind `sessionUri` to `userIdentifier`, a user who learns another user's `request_uri` (e.g. via browser logs, shared link) could land tokens in their own vault or complete another user's consent. Add server‑side tracking of initiated sessions, or explicitly verify AgentCore enforces this binding and document the assumption. | 🔴 Critical |
| 2 | `backend/src/agents/main_agent/session/hooks/oauth_consent.py` | 51–76 | `_AUTH_FAILURE_PATTERN` matches partial words for `invalid[_\s-]token`, `expired[_\s-]token`, etc. with no `\b` boundary. A tool error whose text includes `/v1/401/...` or references "unauthorized" elsewhere will trigger force‑reauth and surface a spurious consent popup. Tighten to word boundaries and prefer `status_code`/explicit markers over regex scraping. | 🟡 High |
| 3 | `backend/src/apis/shared/oauth/agentcore_registrar.py` | 291 | `response.get("clientSecretArn") or {}`: if AgentCore ever returns this field as null or an unexpected shape, you get silent data loss or a runtime `AttributeError`. Add a type assertion so future API changes surface as real errors. | 🟡 Medium |
| 4 | `backend/src/apis/app_api/admin/oauth/routes.py` | 115–130 | Create‑provider rollback on DB failure is best‑effort; if rollback also fails you have an orphaned AgentCore provider and no reconciliation. Either store a pending‑cleanup row to retry async, or at least emit a CloudWatch metric/alarm on the orphan case. | 🟡 Medium |
| 5 | `backend/src/apis/inference_api/connectors/routes.py` | 141–148 | Inline `import boto3; import os` inside the handler, leaks access to the private `_RUNTIME_WORKLOAD_ENV` (never actually used), and constructs the boto3 client per request. Move imports to module scope and cache the client. | 🟡 Medium |
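
One way to express the tightening that issue #2 asks for is sketched below. The pattern is an illustrative guess, not the PR's actual `_AUTH_FAILURE_PATTERN`; the `is_auth_failure` helper and the specific clauses are assumptions. It shows word boundaries on the token clauses and a guard that keeps `401` from matching inside a URL path.

```python
import re

# Illustrative pattern only; the real clauses in oauth_consent.py may differ.
_AUTH_FAILURE_PATTERN = re.compile(
    r"""
      (?<![/\w.])401(?![/\w])       # bare 401, not a path segment like /v1/401/
    | \binvalid[_\s-]token\b        # whole-word invalid_token / invalid token
    | \bexpired[_\s-]token\b        # whole-word expired_token / expired-token
    """,
    re.IGNORECASE | re.VERBOSE,
)


def is_auth_failure(message: str) -> bool:
    """Return True when the error text looks like an OAuth/HTTP auth failure."""
    return _AUTH_FAILURE_PATTERN.search(message) is not None
```

Text scraping remains a fallback at best. Where a tool result exposes a numeric status code, checking that field directly is more reliable, as the review suggests.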

Suggestions

| # | File | Line | Suggestion | Category |
|---|------|------|------------|----------|
| 1 | `backend/src/agents/main_agent/integrations/oauth_token_cache.py` | 38 | `def set(...)` shadows the built‑in. Rename to `put` or `store`. | Style |
| 2 | `backend/src/agents/main_agent/session/hooks/oauth_consent.py` | 237 | Dedup key `_oauth_reauth_attempted` could collide with a future Strands field. Use a module constant with a distinctive name. | Maintainability |
| 3 | `agentcore_identity.py` vs `provider_repository.py` | 111 / 30 | Default region inconsistency: `us-east-1` vs `us-west-2`. Centralize in one config module. | Correctness |
| 4 | `backend/src/agents/main_agent/integrations/external_mcp_client.py` | 220 | In‑process `self.clients` dict keyed by `user_id:tool_id` has no eviction. Long‑running Fargate tasks accumulate forever. Bound size (LRU) or evict on session end. | Performance |
| 5 | `backend/src/agents/main_agent/base_agent.py` | 182–189, 302–304 | New `ThreadPoolExecutor()` per tool lookup and per external‑tool load. Reuse a module‑level executor. | Performance |
| 6 | `backend/src/agents/main_agent/streaming/stream_coordinator.py` | 204 | `_extract_oauth_required_events` reads `agent._interrupt_state`, a private Strands attribute. Any SDK rename breaks OAuth silently. Add a try/except with a loud warning, or ask Strands for a public accessor. | Maintainability |
| 7 | `frontend/ai.client/src/app/oauth-complete/oauth-complete.page.ts` | 266 | `this.config.inferenceApiUrl().replace(/\/invocations\/?$/, '')` is fragile. Add a dedicated `inferenceApiBaseUrl()` config or compose from a known base + path. | Maintainability |
| 8 | `CLAUDE.MD` | n/a | Filename `CLAUDE.MD` (not `CLAUDE.md`) diverges from the path used elsewhere on case‑sensitive filesystems. Pre‑existing but worth renaming. | Style |
| 9 | `backend/src/apis/app_api/admin/oauth/routes.py` | 176 | "Discovery config can only be updated together with a credential rotation": surface AgentCore's own constraint in the error message for operators. | UX |
| 10 | `frontend/ai.client/src/app/session/services/chat/chat-request.service.ts` | 104 | `this.lastRequestObject = { ...requestObject }` is a shallow copy. Nested arrays (`enabled_tools`, `file_upload_ids`) can mutate after snapshot. Use `structuredClone`. | Correctness |
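
Suggestion #4 (bounding the in-process client dict) could be approached with a small LRU wrapper along these lines. The `BoundedClientCache` class, its default size, and the `on_evict` callback are hypothetical. The review names only the `user_id:tool_id` key format and the need for eviction.

```python
from collections import OrderedDict
from typing import Any, Callable, Optional


class BoundedClientCache:
    """LRU cache for per-user MCP clients, keyed by "user_id:tool_id"."""

    def __init__(
        self,
        max_size: int = 256,
        on_evict: Optional[Callable[[str, Any], None]] = None,
    ) -> None:
        self._max_size = max_size
        self._on_evict = on_evict
        self._entries: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        client = self._entries.get(key)
        if client is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return client

    def put(self, key: str, client: Any) -> None:
        self._entries[key] = client
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            old_key, old_client = self._entries.popitem(last=False)
            if self._on_evict is not None:
                self._on_evict(old_key, old_client)  # e.g. close the client

    def __len__(self) -> int:
        return len(self._entries)
```

The `on_evict` hook gives the owner a place to close network resources held by an evicted client. Whether eviction on session end is also needed depends on how long clients are expected to live.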

What Looks Good

Clean division of authority between AgentCore (secrets, vault, callback URL) and DynamoDB (display/RBAC/scopes), well documented in module docstrings.
OAuthConsentHook elegantly covers both cold‑start and stale‑token paths via BeforeToolCallEvent + AfterToolCallEvent, with a per‑turn retry guard.
BroadcastChannel + postMessage dual path for popup → opener handoff correctly anticipates COOP severing window.opener after the popup traverses external origins.
Resume protocol is thoughtful: snapshotting the prior request, sending an empty message with interrupt_responses, bypassing quota/RAG/file resolution on resume.
Test coverage is meaningful: test_oauth_consent_hook.py (+419) and test_agentcore_identity.py (+182) exercise the consent paths.
Frontend SSE parser correctly allows oauth_required after message_stop/done.
IAM scoping in inference-api-stack.ts:490 properly constrains GetResourceOauth2Token to the token vault and workload‑identity directory.

Verdict

Request Changes — merge‑blocking on items #1 (verify complete_consent authorization binding, or add server‑side session tracking) and #2 (tighten auth‑failure regex). Remaining items safe as follow‑ups.

CI: mergeStateStatus: UNSTABLE; "Test Python Code" on App API and Inference API workflows were still in progress at review time — verify they pass before merging.

🤖 Generated with Claude Code

…uth-failure regex Hardens two gaps called out in review of the AgentCore OAuth flow. - `/connectors/complete-consent` now verifies the submitted `session_uri` was issued to the authenticated user at `initiate_consent`, rejecting cross-user replay with 403 before ever calling AgentCore. Backed by a thread-safe TTL cache (10 min, single-use). Soft-fails with a warning when AgentCore's authorize URL doesn't carry a recognised session parameter, so an SDK shape change logs rather than blocks. - `_AUTH_FAILURE_PATTERN` tightened with word boundaries on every clause and a non-path guard on `401` so tool errors containing `/v1/401/...` no longer trigger a spurious force-reauth. Also moves `import boto3`/`os` out of the `complete_consent` handler body and caches the control-plane client via `lru_cache`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
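
The thread-safe, single-use TTL cache this commit describes could be sketched as below. The class and method names are hypothetical; the 10-minute TTL, single-use semantics, and per-user binding come from the commit message. A later commit in this PR drops the cache again in favor of AgentCore's own binding.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class ConsentSessionCache:
    """Remembers which user each consent session URI was issued to."""

    def __init__(
        self,
        ttl_seconds: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[str, float]] = {}

    def remember(self, session_uri: str, user_id: str) -> None:
        """Record at initiate-consent time that this URI belongs to user_id."""
        with self._lock:
            self._entries[session_uri] = (user_id, self._clock() + self._ttl)

    def consume(self, session_uri: str, user_id: str) -> bool:
        """Return True once for the rightful user; False otherwise."""
        with self._lock:
            entry = self._entries.get(session_uri)
            if entry is None:
                return False
            owner, expires_at = entry
            if self._clock() >= expires_at:
                del self._entries[session_uri]
                return False
            if owner != user_id:
                return False  # keep the entry so the real owner can finish
            del self._entries[session_uri]  # single use
            return True
```

A `complete-consent` handler in this sketch would return 403 whenever `consume` is `False`. Because the cache is per process, a multi-worker deployment or a restart between initiate and complete would reject legitimate users.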

…back Addresses the remaining two critical items from PR #174 review. Registrar response parsing (`_info_from_response`): fails loudly on contract violations rather than silently storing empty strings. Missing `clientSecretArn` still tolerated (some vendors won't persist one) but a wrong-shape `clientSecretArn` or absent `credentialProviderArn` now raises TypeError so an AgentCore API change surfaces as a real error. Admin create-provider rollback (`_rollback_orphaned_provider`): now retries the AgentCore delete twice with backoff before giving up. On exhaustion, emits a CloudWatch `Agentcore/OAuth::ProviderOrphaned` custom metric so ops can alarm on stranded credential providers. Secondary failures (CW down, registrar down after retries) never shadow the admin's original 5xx — they only log. The subsequent create attempt that hits `CredentialProviderConflictError` with no DB record now returns an actionable 409 pointing at the AWS CLI cleanup command instead of a bare "already exists". App API task role grants `cloudwatch:PutMetricData` scoped to the `Agentcore/OAuth` namespace via a condition key. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
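
The retry-then-metric rollback described above might look like the following sketch. The function signature and the injected callables are hypothetical, and it assumes "retries twice" means one initial attempt plus two retries. The real `_rollback_orphaned_provider` is not shown on this page.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


def rollback_orphaned_provider(
    delete_provider: Callable[[], None],
    emit_orphan_metric: Callable[[], None],
    retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to delete the orphaned provider; emit a metric if every try fails.

    Never raises, so the caller's original error is not shadowed.
    """
    attempts = 1 + retries
    for attempt in range(attempts):
        try:
            delete_provider()
            return True
        except Exception:
            logger.warning("rollback attempt %d/%d failed", attempt + 1, attempts)
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # exponential backoff
    try:
        emit_orphan_metric()
    except Exception:
        logger.exception("could not emit orphan metric")
    return False
```

Injecting `delete_provider`, `emit_orphan_metric`, and `sleep` keeps the helper free of AWS dependencies and lets a unit test run it without real delays.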

- Reject non-https authorizationUrls at both intake and open time so a compromised backend can't smuggle javascript:/data: URIs into a user click. - Replace window.location.href hijack on popup-block with a blocked signal; the banner renders an "Open in new tab" anchor instead of tearing down the chat tab. - Reject resume requests whose interruptIds aren't present in the cached agent's _interrupt_state with 400, preventing silent acceptance after cache eviction, process restart, or forged payloads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
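
The https-only rule in the first bullet is simple enough to show in a few lines. The check appears to live in the frontend in this PR; the Python version below is only an equivalent sketch, and the `is_safe_authorization_url` name is hypothetical.

```python
from urllib.parse import urlparse


def is_safe_authorization_url(url: str) -> bool:
    """Accept only absolute https URLs; reject javascript:, data:, http:, etc."""
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        return False  # malformed input, e.g. a broken IPv6 literal
    return parsed.scheme == "https" and bool(parsed.netloc)
```

Checking the parsed scheme rather than a string prefix avoids being fooled by leading whitespace or mixed-case schemes.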

CodeQL flagged the provider_id interpolation as clear-text logging of sensitive data — its taint analysis traces provider_id back through the OAuth credential path. The provider ID itself isn't secret, but the log line doesn't need it: tool_id already identifies the tool, and "(OAuth)" alone confirms auth was wired up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… tests Both tests codify behavior that commit b55653d intentionally retired: - TestInvocationsOAuthRequired exercised drain_pending_consent and the route-level oauth_required emission path. That path is gone — consent URLs now flow through Strands' _interrupt_state inside agent.stream_async (stream_coordinator.py:543), and the hook behavior is covered by tests/agents/main_agent/session/test_oauth_consent_hook.py. - test_missing_message_returns_422 expected message to be required, but InvocationRequest.message is now default "" so resume requests can reuse the original prompt from interrupt context. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…lated logic

…1 detection Fixes the constant Google re-auth bug: the consent hook was calling AgentCore Identity with `callback_url=None` whenever the inference API ran outside the Runtime proxy (every local-dev session). AgentCore then issued an authorize URL whose redirect went somewhere other than `/oauth-complete`, so consent never finalized and every request looped back through the consent flow. Adds a `CallbackUrlUnavailableError` and an `AGENTCORE_LOCAL_OAUTH_CALLBACK_URL` env-var fallback in `_resolve_callback_url`, so the failure mode is now loud instead of silent. Both the chat-triggered consent hook and the settings-page `initiate-consent` route catch it and return 503 with actionable guidance. Also tightens the OAuth 401 detection regex to reduce false-positive re-auth prompts: `\bunauthorized\b` now requires proximity to an HTTP/status/code keyword (previously matched prose like "unauthorized to view this calendar"), and adds high-confidence signals for OAuth `invalid_grant` (refresh-token revocation) and Google's `UNAUTHENTICATED` status / `invalid authentication credentials` message. Drops the in-process `session_cache` defence-in-depth on `complete-consent`: AgentCore's own `userIdentifier` ↔ `sessionUri` binding already rejects mismatched completions, and the local cache cost real operational pain (multi-worker / restart / `--reload` would break legitimate consent flows with a confusing 403). Trust the JWT-derived `current_user` plus AgentCore's binding instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Several user-facing connector improvements that share a foundation (per-user `force_reauth` lifecycle in the in-process token cache):

- New `GET /connectors/{id}/status`: side-effect-free read that the settings page uses to render a "Connected" badge without committing the user to a consent flow (initiate-consent always triggers a server-side pending session). Honors the `force_reauth` flag: a just-disconnected user is reported as not connected even if the vault still holds an unexpired token.
- New `DELETE /connectors/{id}/connection`: best-effort disconnect that flips the local `force_reauth` flag (AgentCore exposes no per-user vault-delete API). The next status check returns `connected: false`, the next initiate-consent passes `force_authentication=True`, and the user re-authorizes from scratch. complete-consent clears the flag on success so the UI flips back to connected without waiting on the agent loop to warm the cache.
- Frontend Disconnect button on connected rows. Confirmation dialog uses the existing `ConfirmationDialogComponent` (CDK Dialog, destructive styling); also swapped the admin connector-list delete from native `confirm()` to the same component for visual consistency.
- Closed-popup recovery in `OAuthConsentService`: poll `popup.closed` after open and drop the provider from `inFlight` if the user dismisses without completing consent. The pending request stays so the chat banner re-offers Connect; the settings page resets `awaiting` → `idle` via the new `inFlightProviders` signal.
- Settings page: loading skeleton in the row's action area while the status probe resolves, dropped the misleading "Reconnect" button (clicking it just hit `initiate-consent` and toasted "already connected"), and removed the scope-list display under each connector.
- Forward Google's `access_type=offline` (per AgentCore Identity docs) via a new vendor-baseline helper, plumbed through both the chat-triggered consent hook and the settings/initiate-consent / status routes via two new optional lookups on `OAuthConsentHook` (`provider_type_lookup`, `custom_parameters_lookup`). Without this Google issues a 1-hour access token with no refresh path and the vault entry becomes unrefreshable.
- Admin-configurable `custom_parameters` field on the OAuth provider record (DynamoDB `customParameters` map, Pydantic Create/Update/Response, admin form `key=value` textarea with parse/serialize helpers). Merged with the vendor baseline at request time; the baseline wins on conflict so admins cannot accidentally turn off documented vendor requirements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
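
The "baseline wins on conflict" merge and the `key=value` textarea parsing mentioned in this commit can be sketched as follows. The helper names and the `VENDOR_BASELINE` table are hypothetical; only Google's `access_type=offline` baseline and the merge precedence come from the commit message.

```python
from typing import Dict

# Hypothetical baseline table; only the Google entry is named in the commit.
VENDOR_BASELINE: Dict[str, Dict[str, str]] = {
    "google": {"access_type": "offline"},
}


def parse_custom_parameters(text: str) -> Dict[str, str]:
    """Parse a textarea of key=value lines, skipping blank lines."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got: {line!r}")
        params[key.strip()] = value.strip()
    return params


def effective_parameters(
    provider_type: str, admin_params: Dict[str, str]
) -> Dict[str, str]:
    """Merge admin parameters with the vendor baseline; baseline wins."""
    merged = dict(admin_params)
    merged.update(VENDOR_BASELINE.get(provider_type, {}))
    return merged
```

Applying the baseline last means an admin entry such as `access_type=online` is silently overridden for Google, which matches the stated intent that documented vendor requirements cannot be turned off.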

…aceholders Per the AgentCore Identity supported-providers docs, Slack, Salesforce, and Zoom are first-class vendors with pre-configured endpoints — admins only need to supply credentials. Verified the exact `credentialProviderVendor` strings and `oauth2ProviderConfigInput` keys against the SDK shape (`Oauth2ProviderConfigInput.members`): - Slack → SlackOauth2 / slackOauth2ProviderConfig - Salesforce → SalesforceOauth2 / salesforceOauth2ProviderConfig - Zoom → ZoomOauth2 / includedOauth2ProviderConfig (shared key for simpler vendors) Backend additions: `SLACK`, `SALESFORCE`, `ZOOM` on `OAuthProviderType`; vendor + config-key entries on the registrar. The existing discovery-URL guard correctly rejects discovery URLs for these new types. Frontend additions: matching `ConnectorType` literals; preset entries with sensible default scopes and vendor-relevant placeholder hints (e.g. Salesforce `api, refresh_token, offline_access, id, openid`); icon class branches for the new tiles (Slack fuchsia + chat bubble, Salesforce sky + cloud, Zoom blue + video camera). Form polish: - `scopesPlaceholder` / `customParametersPlaceholder` on each preset. Form binds them via computed signals so the hints update as the admin switches between providers. - Selecting a preset seeds `customParameters` only when the preset declares `defaultCustomParameters` — avoids clobbering user-typed content for presets that have only a hint. - Dropped the Google `defaultScopes`. The OIDC-only `openid email profile` set doesn't actually let an agent do anything useful with Google APIs (Calendar/Gmail/Drive each need different scopes), so the form lands empty and the placeholder shows the URL format as a hint. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
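
The vendor and config-key pairs listed in this commit lend themselves to a lookup table. The sketch below uses the three pairs exactly as stated above. The `build_provider_request` helper and the inner `clientId` / `clientSecret` shape are assumptions about how a registrar might assemble the request.

```python
from typing import Dict, Optional, Tuple

# (credentialProviderVendor, oauth2ProviderConfigInput key), from the commit.
VENDOR_CONFIG: Dict[str, Tuple[str, str]] = {
    "slack": ("SlackOauth2", "slackOauth2ProviderConfig"),
    "salesforce": ("SalesforceOauth2", "salesforceOauth2ProviderConfig"),
    "zoom": ("ZoomOauth2", "includedOauth2ProviderConfig"),
}


def build_provider_request(
    provider_type: str,
    name: str,
    client_id: str,
    client_secret: str,
    discovery_url: Optional[str] = None,
) -> dict:
    """Assemble a create-provider request body for a first-class vendor."""
    if provider_type not in VENDOR_CONFIG:
        raise ValueError(f"unsupported provider type: {provider_type}")
    if discovery_url is not None:
        # Mirrors the guard described above: discovery URLs are Custom-only.
        raise ValueError("discovery URLs are only valid for Custom providers")
    vendor, config_key = VENDOR_CONFIG[provider_type]
    return {
        "name": name,
        "credentialProviderVendor": vendor,
        "oauth2ProviderConfigInput": {
            config_key: {"clientId": client_id, "clientSecret": client_secret}
        },
    }
```

Keeping the mapping in one table means adding another pre-configured vendor is a one-line change rather than a new code branch.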



…lidation

…store

Replaces the floating OAuth banner with an inline prompt anchored to the assistant turn that triggered consent, and persists pending interrupts to session metadata so a browser refresh rediscovers them instead of leaving the tool call orphaned in `pending` forever.

Backend
- New `PendingInterrupt` model on `apis.shared.sessions.models`; included on `MessagesListResponse` and `SessionMetadata`.
- `metadata.add_pending_interrupt` / `remove_pending_interrupts` / `get_pending_interrupts` helpers using GSI lookup + targeted UpdateExpression.
- `StreamCoordinator._extract_oauth_required_events` is now async and persists each interrupt before yielding the SSE event; failures log but never break the live stream.
- `get_messages_from_cloud` fetches pending interrupts in parallel.
- `/invocations` resume path clears resolved interrupts from metadata after `agent.stream_async` completes.
- New `DELETE /sessions/{sid}/pending-interrupts/{iid:path}` endpoint for explicit dismiss; colon-bearing Strands ids preserved via `:path`.

Frontend
- New `OAuthConsentPromptComponent` with a refined inline card design, connector icon (admin base64 wins over heroicon, falls back to providerType default), eyebrow/lock motif, primary gradient action button, hover-revealed dismiss, fade+slide entrance.
- `MessageMapService.loadMessagesForSession` hydrates pending interrupts on session load; anchors to triggering message id when present, else the most recent assistant message.
- `OAuthConsentService.openConsentPopup` is async; lazy-fetches a fresh authorization URL via `initiate-consent` when the stored one is absent or expired (handles "already consented in another tab" by auto-resuming).
- `OAuthConsentService.dismiss` syncs to backend by default; completion flow opts out so the resume path's own cleanup isn't double-fired.
- `MessageListComponent` renders unanchored interrupts at end-of-list as a fallback for the "partial assistant message wasn't persisted" case.
- `awaiting_auth` derived tool status renders as a primary-blue ring on the tool-rail dot instead of an indefinite amber spinner.
- `ChatRequestService.resumeFromOAuthConsent` accepts a fallback session id (post-refresh case where `lastRequestObject` is null) and surfaces 400 `Unknown or expired interrupt ids` as a conversational error.
- Old floating `OAuthConsentBannerComponent` removed.

Known follow-up
- First-turn-of-a-new-session OAuth: persistence currently no-ops because the session metadata row doesn't exist yet when the interrupt fires. Tracked separately; sidecar item or upsert pattern is the likely fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…n handling

- Add functions to ensure session metadata existence and update session title and activity.
- Implement logic for handling session activity updates, including message count increments and preferences merging.
- Introduce deduplication for pending interrupts to prevent duplicate entries during session updates.
- Update frontend components to reflect changes in session management, including OAuth consent prompts and message handling.
- Refactor session service interfaces to use camelCase for consistency with backend responses.
- Enhance tests for session activity updates, pending interrupts, and ensure proper handling of session metadata.
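
The pending-interrupt deduplication mentioned in the list above could look roughly like the following. This is a minimal sketch, not the code from this PR: the `PendingInterrupt` field names (`interrupt_id`, `provider_id`) and the `merge_pending_interrupts` helper are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingInterrupt:
    # Hypothetical fields; the real model lives in apis.shared.sessions.models.
    interrupt_id: str
    provider_id: str


def merge_pending_interrupts(existing, incoming):
    """Return existing + incoming with duplicate interrupt ids dropped.

    Order is preserved and the first entry seen for an id wins.
    """
    seen = set()
    merged = []
    for item in [*existing, *incoming]:
        if item.interrupt_id in seen:
            continue  # already recorded for this session
        seen.add(item.interrupt_id)
        merged.append(item)
    return merged
```

Keying on the interrupt id rather than the provider id means two parallel tool calls to the same provider would both stay in the list, which matches the later commit note that Strands derives distinct interrupt ids from `toolUseId`.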

Resume after an OAuth-gated tool call only worked when the in-memory agent cache still held the original turn. After a browser refresh the frontend lost its request snapshot and the resume request landed with no enabled_tools / model_id, so the inference API rebuilt a fresh agent with an empty external-tool registry — the paused tool call had nothing to resume against and the LLM responded that the tool wasn't available.

Resume contract now lives server-side. On pause, the stream coordinator captures a ``PausedTurnSnapshot`` (enabled_tools, model_id, provider, temperature, system_prompt, caching_enabled, max_tokens) onto the session row alongside the existing ``pendingInterrupts``. On resume, the inference API loads the snapshot and rebuilds the agent from it; Strands' SessionManager then restores ``_interrupt_state`` from AgentCore Memory, so the paused tool call picks up where it left off regardless of cache hit/miss, refresh, or pod restart.

Frontend ``lastRequestObject`` snapshotting is gone — the resume payload is now ``{ session_id, message: '', interrupt_responses }``. Server-side snapshot has a 1h TTL; cleared on full turn completion and at the start of any new (non-resume) turn.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
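
As a rough illustration of the server-side resume contract described above, the sketch below captures a snapshot with a 1h expiry and rejects resume when the snapshot is missing or expired. The field names come from the commit message; the field types, the `expires_at` attribute on the dataclass, the `ResumeRejected` exception, and both helper functions are assumptions for this example and not the PR's actual code (the real route returns an HTTP error).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

SNAPSHOT_TTL = timedelta(hours=1)  # the commit message states a 1h TTL


@dataclass
class PausedTurnSnapshot:
    # Field names follow the commit message; the types are assumptions.
    enabled_tools: list
    model_id: str
    provider: Optional[str] = None
    temperature: Optional[float] = None
    system_prompt: Optional[str] = None
    caching_enabled: bool = False
    max_tokens: Optional[int] = None
    expires_at: Optional[datetime] = None


class ResumeRejected(Exception):
    """Hypothetical error type used only in this sketch."""


def capture_snapshot(now, **settings):
    """Record the turn's settings at pause time, stamped with an expiry."""
    return PausedTurnSnapshot(expires_at=now + SNAPSHOT_TTL, **settings)


def load_for_resume(snapshot, now):
    """Return the snapshot if it is usable, otherwise reject the resume."""
    if snapshot is None:
        raise ResumeRejected("no paused_turn snapshot")
    if snapshot.expires_at and now > snapshot.expires_at:
        raise ResumeRejected("Paused turn expired; restart the turn.")
    return snapshot
```

Passing `now` in explicitly (for example `datetime.now(timezone.utc)` at the call site) keeps the expiry check easy to test without patching the clock.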

…n't fail a turn

Previously, ``load_external_tools`` cached newly-created MCP clients without verifying the server was actually reachable. A single connector that wasn't running locally (or whose endpoint was misconfigured) would sit in the registry and fail the whole turn the first time Strands called ``load_tools()`` on it.

Pre-flight each new client immediately after construction. On failure, log a warning, skip the tool, and continue — the user keeps their other tools. On success the call also primes the client's tool cache, so Strands' later ``load_tools()`` becomes a no-op.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
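
The pre-flight behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `preflight_clients` and the `probe` callable are hypothetical, with `probe` standing in for whatever call lists a client's tools and raises when the server is unreachable.

```python
import logging

logger = logging.getLogger(__name__)


def preflight_clients(clients, probe):
    """Keep only the clients whose probe call succeeds.

    `clients` maps a tool id to a client object; `probe` is any callable that
    raises when the server cannot be reached. Both shapes are assumptions
    made for this sketch.
    """
    healthy = {}
    for tool_id, client in clients.items():
        try:
            probe(client)
        except Exception as exc:  # one bad connector must not fail the turn
            logger.warning("Skipping external tool %s: %s", tool_id, exc)
            continue
        healthy[tool_id] = client
    return healthy
```

Catching a broad `Exception` is deliberate here: the goal stated in the commit message is that any single connector failure degrades to a skipped tool rather than a failed turn.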

+    """
+    user_id = current_user.user_id
+
+    logger.info("DELETE /sessions/%s/pending-interrupts/%s", session_id, interrupt_id)




+            if not snapshot:
+                logger.warning(
+                    "Resume rejected: no paused_turn snapshot for session %s",
+                    input_data.session_id,


+            if expires_at and datetime.now(timezone.utc) > expires_at:
+                logger.warning(
+                    "Resume rejected: paused_turn snapshot expired for session %s",
+                    input_data.session_id,


+                    detail="Paused turn expired; restart the turn.",
+                )
+
+            caching_enabled = snapshot.caching_enabled


ensure_session_metadata_exists() now runs unconditionally on /invocations and raises when DYNAMODB_SESSIONS_METADATA_TABLE_NAME is unset, breaking route tests that mock the agent and skip DynamoDB. Stub it via an autouse fixture so route tests exercise the route, not the persistence layer. Also patch the new get_pending_interrupts call in the cloud-message tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The disconnect flag lived in a module-level set inside the inference API process, so a /disconnect on one replica was invisible to any other. Under multi-replica deploys the user could see "Connected" on one request and "needs consent" on the next, and the AfterToolCallEvent 401-retry path likewise lost its intent on replica fan-out.

Move the per-(user, provider) disconnect flag to a new OAuthDisconnectRepository on the existing oauth-user-tokens DynamoDB table (already provisioned, KMS-encrypted, with R/W IAM granted to the inference API). The token cache stays as a hot-path L1 for tokens only; the consent hook reads the disconnect repo on every BeforeToolCallEvent so a disconnect anywhere is honored on the next tool run anywhere.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…wlist

The frontend posts an `OAuth2CallbackUrl` header on every consent-related request, and the inference-api middleware was forwarding it verbatim into `BedrockAgentCoreContext`. An authenticated user could pivot the OAuth redirect to an attacker-controlled origin and capture the authorization code on consent.

Reuse `CORS_ORIGINS` as the trust boundary, pin the path to `/oauth-complete`, and reject non-http(s) schemes, query strings, and fragments.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
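
The validation rules listed above (allowlisted origin, pinned path, http(s) only, no query string or fragment) can be sketched with the standard library. This is a minimal example, not the middleware from this PR: `is_allowed_callback` and the example origins are hypothetical, and `allowed_origins` is assumed to be a set of lowercase origins without a trailing slash, as a `CORS_ORIGINS` list usually is.

```python
from urllib.parse import urlsplit

CALLBACK_PATH = "/oauth-complete"  # path pinned per the commit message


def is_allowed_callback(url, allowed_origins):
    """Return True only for an allowlisted origin plus the pinned path."""
    # Reject any query string or fragment, including empty ones.
    if "?" in url or "#" in url:
        return False
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. malformed IPv6 literal
        return False
    if parts.scheme not in ("http", "https"):
        return False
    if parts.path != CALLBACK_PATH:
        return False
    # netloc includes any userinfo and port, so "user@host" never matches.
    origin = f"{parts.scheme}://{parts.netloc.lower()}"
    return origin in allowed_origins
```

For instance, with `allowed = {"https://app.example.com"}`, the URL `https://app.example.com/oauth-complete` passes while `https://evil.example/oauth-complete` and `https://app.example.com/oauth-complete?next=x` are rejected. Comparing the full origin string, rather than only the hostname, also rejects an unexpected port or embedded credentials.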

A misconfigured provider (wrong scope, perma-401) would otherwise spawn a fresh consent prompt on every tool call in a turn: the per-tool-use retry guard reset for each new toolUseId, so the model could trigger prompt-after-prompt with no upper bound. Track attempted providers on the hook itself, reset on `BeforeInvocationEvent` (fires per turn, including resume), so the user sees at most one consent prompt per provider per turn before 401s flow through to the model.

Also clarify the `event.interrupt(name="oauth:{provider_id}")` comment: the SDK's BeforeToolCallEvent._interrupt_id folds in `toolUseId`, so parallel tool calls to the same provider already produce distinct interrupt ids. New regression test pins that invariant.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
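
The per-turn bound described above amounts to a small piece of state on the hook. The sketch below shows the idea in isolation; `ConsentAttemptGuard` and its method names are hypothetical and do not use the Strands hook API, so wiring `on_before_invocation` to `BeforeInvocationEvent` is left as an assumption.

```python
class ConsentAttemptGuard:
    """At most one consent prompt per provider per turn (illustrative only)."""

    def __init__(self):
        self._attempted = set()

    def on_before_invocation(self):
        # Assumed to run once per turn (including resume), so the
        # per-provider budget resets here.
        self._attempted.clear()

    def should_prompt(self, provider_id):
        """Return True only the first time a provider needs consent this turn."""
        if provider_id in self._attempted:
            return False  # let the 401 flow through to the model instead
        self._attempted.add(provider_id)
        return True
```

Because the set is keyed by provider rather than by `toolUseId`, a provider that keeps returning 401 gets one prompt and then stops interrupting the turn.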

A stream replay after refresh, or a late server-side breadcrumb clear, could fire the same `oauth_required` event again after a successful consent or explicit dismissal — and the prompt would resurrect because provider-keyed dedup re-added the entry. Track seen interrupt ids on the consent service so already-resolved interrupts stay gone for the session. New tool calls always carry a fresh interrupt id (Strands generates it from `toolUseId`), so legitimate prompts are never suppressed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

+            return
+        result = self._mark_disconnected(provider_id)
+        if inspect.isawaitable(result):
+            await result


… stack

The referenced tables live in InfrastructureStack (moved there to break a prior circular dep); update 9 SSM-read comments to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

philmerrell and others added 13 commits April 22, 2026 10:55

chore(connectors): grant IAM for credential-provider admin ops

2961906

Adds Create/Update/Delete/Get/List on bedrock-agentcore OAuth2 credential providers to the app-api task role, scoped to the default token vault. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore: gitignore .claude/scheduled_tasks.lock

4657daf

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-advanced-security AI found potential problems Apr 22, 2026


Comment thread backend/src/apis/inference_api/connectors/routes.py

logger.error(

"CompleteResourceTokenAuth failed for user=%s provider=%s: %s",

current_user.user_id,

body.provider_id,

Comment thread backend/src/apis/inference_api/connectors/routes.py Fixed

philmerrell and others added 2 commits April 22, 2026 17:46

github-advanced-security AI found potential problems Apr 22, 2026


Comment thread backend/src/apis/inference_api/connectors/routes.py Fixed

Comment thread backend/src/apis/inference_api/connectors/routes.py Fixed

philmerrell and others added 3 commits April 22, 2026 21:00

philmerrell assigned colinmxs Apr 23, 2026

philmerrell and others added 4 commits April 25, 2026 07:44

feat(connectors): implement tool-config freshness cache and update re…

bcef3b4

…lated logic

github-advanced-security AI found potential problems Apr 25, 2026


Comment thread backend/src/apis/inference_api/connectors/routes.py

logger.info(

"Completed OAuth consent for user=%s provider=%s",

current_user.user_id,

body.provider_id,

philmerrell and others added 3 commits April 25, 2026 13:22

feat(connectors): add support for optional base64 icon uploads and va…

1622d20

…lidation

feat(connectors): add OAuth consent prompt component for authorizatio…

94fcf24

…n handling

philmerrell and others added 3 commits April 25, 2026 21:28

github-advanced-security AI found potential problems Apr 26, 2026


philmerrell and others added 6 commits April 26, 2026 15:52

fix: Update oauth consent prompt styling

21d96f1

github-advanced-security AI found potential problems Apr 27, 2026


Comment thread backend/src/agents/main_agent/session/hooks/oauth_consent.py

return

result = self._mark_disconnected(provider_id)

if inspect.isawaitable(result):

await result

chore(infra): correct stale "App API Stack" comments in inference-api…

9cae6e2

… stack

philmerrell merged commit d04f808 into develop Apr 27, 2026
33 checks passed

philmerrell mentioned this pull request Apr 27, 2026

fix: grant CreateTokenVault and wire OAuth providers table to app-api #179

Merged

5 tasks

philmerrell merged 35 commits into develop from feature/connectors


3 participants


philmerrell commented Apr 22, 2026

Code Review

Critical Issues

Suggestions

What Looks Good

Verdict
